Abstract

Background: Noncoding RNA (ncRNA) has been recognized as an important regulator of gene expression 
networks in Bacteria and Eucaryota. Little is known about ncRNA in thermococcal archaea except for the 
eukaryotic-like C/D and H/ACA modification guide RNAs. 

Results: Using a combination of in silico and experimental approaches, we identified and characterized novel P. 
abyssi ncRNAs transcribed from 12 intergenic regions, ten of which are conserved throughout the Thermococcales. 
Several of them accumulate in the late-exponential phase of growth. Analysis of the genomic context and 
sequence conservation amongst related thermococcal species revealed two novel P. abyssi ncRNA families. The 
CRISPR family comprises crRNAs expressed from two of the four P. abyssi CRISPR cassettes. The 5'UTR derived 
family includes four conserved ncRNAs, two of which have features similar to known bacterial riboswitches. Several 
of the novel ncRNAs have sequence similarities to orphan OrfB transposase elements. Based on RNA secondary 
structure predictions and experimental results, we show that three of the twelve ncRNAs include Kink-turn RNA 
motifs, arguing for a biological role of these ncRNAs in the cell. Furthermore, our results show that several of the 
ncRNAs are subjected to processing events by enzymes that remain to be identified and characterized. 

Conclusions: This work proposes a revised annotation of CRISPR loci in P. abyssi and expands our knowledge of 
ncRNAs in the Thermococcales, thus providing a starting point for studies needed to elucidate their biological 
function. 



Background 

A plethora of noncoding RNAs (ncRNAs), including 
small RNAs that bind to proteins or base pair with target 
RNAs, have been found to operate at all levels of gene 
regulation ranging from the control of enzymatic activity 
to the regulation of the initiation of transcription and 
translation [1-3]. However, whole genome analyses of 
both prokaryotic and eukaryotic organisms have gener- 
ally disregarded ncRNA genes. In Bacteria, systematic 
searches for functional intergenic regions have led to the 
discovery of more than 200 bacterial trans-acting 
ncRNAs of about 50 to 500 nucleotides mostly in E. coli 
[1,4,5], but also in other pathogenic species [2,6-8]. Func- 
tional analysis of these ncRNAs identified many of them 
as regulators of bacterial stress responses. Furthermore, 
the recent discovery of riboswitches [9] and RNA-based 
thermosensors [10] in bacterial 5' untranslated regions 
(5'UTRs) has enlarged the range of posttranscriptional 
control of gene expression. In Archaea, computational 
and experimental analyses led to the identification and 
characterization of the homologues of the eukaryotic box 
C/D- and H/ACA guide snoRNAs, which are involved in 
modification and maturation of tRNAs and rRNAs in 
Crenarchaea and Euryarchaea [11-14]. 

RNomics studies have revealed the occurrence of 
stable antisense RNAs and the expression of ncRNAs 
from intergenic regions in Sulfolobus solfataricus and 
Archaeoglobus fulgidus [15-17]. In silico searches using 
the Pyrococcus furiosus and Methanocaldococcus jannaschii 
genomes predicted several novel ncRNAs originating 
from intergenic regions [18,19]. More recently, 
transcriptome analysis has revealed several dozen 
ncRNAs in the halophilic euryarchaeon Haloferax volcanii 
[20,21] and more than two hundred in the 
methanogenic euryarchaeon Methanosarcina mazei [22]. 
Among the ncRNAs in Archaeoglobus fulgidus [15], 
Sulfolobus solfataricus, Sulfolobus acidocaldarius [16,23] 
and Pyrococcus furiosus [24], several correspond to ladder-like 
transcripts issued from repeated genomic 
sequences or CRISPR loci [25,26]. The CRISPR/Cas 
defense system (for review [27]), identified in most 
archaeal genomes as well as many bacterial genomes, 
provides acquired immunity against viruses and plas- 
mids by targeting nucleic acid in a sequence-specific 
manner [28,29]. The anatomy of a CRISPR locus has 
been defined as an array of short direct repeats of 20 to 
50 base pairs, often containing palindromic sequences 
[30,31]. Irrespective of the precise mechanism of the 
defensive action of CRISPR/Cas systems, there is a con- 
sensus that transcription of the CRISPR cassette initiates 
in or near the leader sequence followed by processing 
by the Cas proteins of the RNA precursor into frag- 
ments (crRNAs) corresponding to the interval of the 
repeats [24]. 

Several stable RNAs in the Archaea, including the box 
C/D and H/ACA guide RNAs, the ribosomal RNAs and 
RNase P, form ribonucleoprotein complexes (RNPs) 
with the multifunctional L7Ae protein that binds to the 
Kink-turn (K-turn) [17,32,33] or the related Kink-loop 
(K-loop) RNA structural motif [34]. These widespread 
motifs [35] provide a platform for assembly of RNPs or, 
in the case of the S-adenosylmethionine and lysine 
riboswitches, orient strands that base pair to form pseu- 
doknots [36,37]. 

The goal of this study was to identify and characterize 
novel ncRNAs in Pyrococcus abyssi, a thermococcal 
archaeon (hyperthermophilic and anaerobic member of 
the euryarchaeal phylum), which is one of the first 
archaea whose genome was sequenced [38]. Stable non- 
coding RNAs that have been identified in P. abyssi are 
ribosomal RNAs (1 16S, 1 23S, and 2 5S genes), RNase 
P (1 gene), tRNAs (46 genes), 7S RNA (1 gene), H/ACA 
guide RNAs (7 genes) and C/D guide RNAs (59 genes). 
Most of these genes have a significantly higher (G+C) 
content compared to the rest of the genome, which is 
AT rich. The (G+C) content of the P. abyssi genome is 
44% compared to 66% and 70% in rRNA and tRNA 
genes, respectively. Considering the availability of several 
related thermococcal genomes and the AT-rich character 
of the P. abyssi genome, we performed computational 
searches for novel ncRNAs. We clustered intergenic 
regions (IGRs) based on primary and secondary structure 
features and on sequence conservation in other thermococcal 
genomes. Northern blotting showed that 24 
out of the 82 selected IGRs are transcribed. Additional 
primer extension and Circular Rapid Amplification of 
cDNA Ends (C-RACE or CR-RT-PCR) experiments and 
in silico analysis showed that twelve of these transcripts 
have characteristics of regulatory ncRNA: three are from 
CRISPR cassettes, three are from mRNA 5'UTRs and six 
are from intergenic regions. Altogether, this study 
allowed us to define two novel families of ncRNA in P. 
abyssi, the CRISPR and the 5'UTR derived ncRNAs. 

Results 

Detection of novel ncRNAs 

The Pyrococcus abyssi IGRs were screened using complementary 
approaches (Additional file 1, Figure S1). The 
first one, which takes into account specific characteristics 
of the P. abyssi genome, recovered 73 GC-rich regions in 
67 IGRs including 51 known ncRNA genes and 22 novel 
regions. The high number of known ncRNA genes found 
by this approach confirms the interest in using AT-rich 
genomes in computational search for ncRNA genes. The 
second comparative approach, which highlights sequence 
conservation between four closely related pyrococcal and 
thermococcal species, produced 106 regions similar to at 
least one region of another thermococcal genome. The 
106 regions cover 95 IGRs of the P. abyssi genome. Based 
on alignment features, 65 regions were selected manually 
from both sets of candidates as described in the Methods 
section. Fourteen regions encoding novel putative 
ncRNAs were found by both approaches (Additional file 
1, Figure S1). To refine our search, 73 regions predicted 
by both approaches were considered with regard to other 
criteria as described in the Methods section including 
free energy of RNA folding, presence of highly structured 
hairpins, and RNA motifs such as K-turns and K-loops 
[34,39] as well as similarities between the P. abyssi genome 
and all other available archaeal genomes. Altogether, 
82 regions were selected as putative genes for ncRNA 
(Figure S1, Table S1). To test whether all these regions 
express detectable transcripts, we performed Northern 
blot analysis with strand-specific direct and reverse oligonucleotide 
probes (Additional file 2, Table S1), and RNA 
extracted from P. abyssi in exponential, entry into stationary 
or stationary phase cultures. Signals were 
detected in 24 out of the 82 regions. Five regions gener- 
ated transcripts up to 700 nt in length (data not shown). 
They were excluded from further analysis since they were 
likely to correspond to regions within polycistronic 
mRNA. We focused on the 19 regions transcribing RNAs 
shorter than 500 nt. Seven regions corresponded to the 
H/ACA RNA genes recently annotated in the P. abyssi 
genome [13]. The remaining 12 regions were clustered 
into four categories (Table 1). The first includes two 
CRISPR loci expressing three 60 nt ncRNAs (crRNAs), 
the second includes four unique loci in the P. abyssi gen- 
ome expressing RNAs ranging in size from 50 to 100 nt, 
the third includes three repetitive loci conserved 
throughout thermococcal genomes expressing RNAs ranging 
in size from 130 to 220 nt, and the fourth includes 
two repetitive loci specific to P. abyssi expressing RNAs 
ranging in size from 145 to 343 nt. More often than not, 
abundance of the expressed RNAs was growth-regulated 
with the highest amounts detected in entry into station- 
ary phase. In many cases, several RNA species arose from 
the same locus suggesting the processing of a primary 
transcript to secondary transcripts. 
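
The compositional screen described above can be illustrated with a sliding-window GC scan. This is a minimal sketch and not the pipeline used in this study: the function names, the 50 nt window, the 10 nt step and the 55% threshold are assumptions chosen for illustration, and the sequence is an invented toy intergenic region.

```python
def gc_fraction(seq):
    """Return the fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def gc_rich_windows(seq, window=50, step=10, threshold=0.55):
    """Yield (start, end, gc) for windows whose GC fraction meets the threshold.

    window, step and threshold are illustrative values, not those of the paper.
    """
    last_start = max(len(seq) - window, 0)
    for start in range(0, last_start + 1, step):
        chunk = seq[start:start + window]
        gc = gc_fraction(chunk)
        if gc >= threshold:
            yield start, start + len(chunk), gc


# Toy intergenic region (made up): AT-rich flanks around a 60 nt GC-rich core.
igr = "AT" * 40 + "GCGGCCGCGGGCCCGCGGCC" * 3 + "TA" * 40
hits = list(gc_rich_windows(igr))

for start, end, gc in hits:
    print(f"{start:4d}-{end:4d}  GC = {gc:.2f}")
```

In an AT-rich genome such as that of P. abyssi, windows that stand out in this way are candidates only; in the study they were combined with comparative and structural criteria before experimental testing.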

Annotation and expression of P. abyssi CRISPR loci 

The comparative computational screen selected the four 
CRISPR loci annotated recently in the P. abyssi genome 
[40]. A careful sequence inspection of these cassettes 
revealed an erratic number of repeats and spacers in the 
P. abyssi CRISPR loci. Based on recent observations 
showing that new spacers are integrated mainly on the 
side proximal to the leader sequence of the CRISPR cassette 
and that degenerate direct repeats accumulate at 
the distal 3' end of the cassette [29], we revised the 
annotation of the P. abyssi CRISPRs (Figure 1A). The 
CRISPR 1 (encoded on the reverse strand) and CRISPR 
4 (encoded on the sense strand) are composed of 23 
and 29 direct repeats and spacers, respectively, with 
similar 30 nt direct repeat sequences. The observation 
that the penultimate and ultimate 3' end-direct repeats 
are degenerate by one to six mismatches confirms the 
orientation proposed by the UCSC archaeal genome 
browser [41], which differs from the orientation pro- 
posed by [40]. The atypically short CRISPR 3 locus con- 
tains only four spacers with degenerate direct repeat 
sequences related to CRISPR 1 and 4. The CRISPR 2 
cassette encompassing eight spacers is distinguished by 
the sequence of its 29 nt direct repeats. In order to posi- 
tion the degenerate direct repeat (four mismatches) at 
the end of the CRISPR cassette, as in the majority of the 
CRISPR loci [27], we propose that CRISPR 2 is encoded 
by the reverse strand as opposed to previously reported 
annotations (UCSC archaeal genome browser and [40]). 
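
The orientation argument used above (degenerate direct repeats accumulate at the leader-distal end of a cassette) can be sketched as a comparison of the terminal repeats with the array consensus. This is an illustrative sketch only: the helper names are hypothetical, the 12 nt repeats are invented rather than taken from P. abyssi, and real cassettes would also need repeat detection and handling of unequal repeat lengths.

```python
from collections import Counter


def mismatches(a, b):
    """Count mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("repeats must have equal length")
    return sum(x != y for x, y in zip(a, b))


def consensus(repeats):
    """Return the most common base at each position across the repeats."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*repeats))


def leader_distal_end(repeats):
    """Guess which end of a repeat array is leader-distal.

    The end whose terminal repeat deviates more from the consensus is
    taken as the distal (3') end. Returns "last", "first" or "undetermined".
    """
    cons = consensus(repeats)
    first = mismatches(repeats[0], cons)
    last = mismatches(repeats[-1], cons)
    if last > first:
        return "last"
    if first > last:
        return "first"
    return "undetermined"


# Invented toy array: four identical repeats and one degenerate terminal repeat.
toy_repeats = ["GTTCCAATAAGA"] * 4 + ["GTTCGAATCAGT"]
print(consensus(toy_repeats))
print(mismatches(toy_repeats[-1], consensus(toy_repeats)))
print(leader_distal_end(toy_repeats))
```

If the function reports "first" for an array read in its annotated direction, the cassette is more plausibly encoded on the opposite strand, which is the reasoning applied to CRISPR 2 above.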

No signals were detected by Northern blotting with 
strand-specific direct or reverse probes against sequence 
matching spacers 2 and 7 of CRISPR 2 and spacer 2 of 
CRISPR 3 suggesting that these loci are not transcribed 
(data not shown). This tentative conclusion was con- 
firmed by more sensitive tests involving primer exten- 
sion analysis to map 5' ends of the CRISPR precursors 
and to identify promoters (see below). In contrast, 
probes against sequences matching spacer 1 of CRISPR 
1 and spacers 4 and 12 of CRISPR 4 allowed the detection 
of transcripts in all three growth phases (Figure 1B; 
Table 1). The approximately 60 nt length of CRISPR- 



Table 1 Novel ncRNAs validated in this study 

| Name | Beginning | End | Length (nt) | 5'-start | Promoter | Adjacent genes | Strand | Conservation | Prediction |
|---|---|---|---|---|---|---|---|---|---|
| CRISPR locus | | | | | | | | | |
| Cr1-1 | 149430 | 147916 | 60 | 149408 | Cons (53 nt) | PAB0095/PABt02 | | All§ | H/R |
| Cr4-4 | 1760062 | 1761854 | 60 | 1760285 | Cons (53 nt) | PAB1170/PABt46 | | | H/R |
| Cr4-12 | | | 60 | 1760832 | | | | | |
| Conserved unique locus | | | | | | | | | |
| sRk11 | 197248 | 197534 | 50 | 197484 | TATA | PAB2227/PAB2402 | | | H/R |
| sRk28 | 527697 | 527833 | 70/100 | 527798 | | PAB1992/PAB1991 | | All except Tsi | H/R |
| sRk33* | 636804 | 636904 | 100 | 636804 | Cons | PAB1916/PAB0465 | | | H/R |
| sRk61 | 1348633 | 1348700 | 60/95 | 1348615 | | PAB1455/PAB0921 | | Pho | R |
| Conserved repetitive locus | | | | | | | | | |
| sRk48 | 985849 | 986002 | 130 | 985805 | Cons | PAB0686/PAB0686.1n | | Pho, Pfu, Tko, Tsi | H/R |
| sRk49 | 1067535 | 1067886 | 150/220 | 1067923 | TATA (60 nt) | PAB0740/PAB0741 | | | R |
| sRk52 | 1104008 | 1104286 | 130/190 | 1104056 | Cons | PABt30/PAB0766 | | Pho, Pfu, Tko, Tsi | H/R |
| Specific repetitive locus | | | | | | | | | |
| sRkB | 809887 | 810256 | 216/343 | 810048 | Cons | PAB1794/PAB0571 | | | H |
| sRkC | 1612986 | 1613347 | 145/220 | 1613195 | Cons | PAB1080.4n/PAB1080.5n | | | H |

The positions indicate the beginning and the end of each predicted region. The RNA lengths are based on Northern blot analysis. The 5' ends of ncRNAs were 
determined by primer extension experiments (Additional file 3, Figure S2). Promoters were predicted based on a search for the entire (Cons: BRE/TATA) or partial (TATA) 
consensus motif located 15 to 30 nt (or at the distance given in parentheses) upstream of the 5' transcription start. Conservation among the 7 annotated 
thermococcal genomes: P. abyssi, P. horikoshii (Pho), P. furiosus (Pfu), T. kodakarensis (Tko), T. sibiricus (Tsi), T. gammatolerans and T. onnurineus. Prediction tools are 
denoted for each region: bias composition (H) and comparative analysis (R). § CRISPR 1 and 4 direct repeats are conserved. * sRk33 is annotated as SscA in P. 
furiosus [22]. 

[Figure 1: panel A maps the four CRISPR cassettes, CRISPR 1 (147916-149430), CRISPR 2 (482332-482830), CRISPR 3 (1090059-1090356) and CRISPR 4 (1760062-1761654), with adjacent genes and CRISPR-related operons (scale bar 500 bp); panel B shows Northern blots for Cr1-1, Cr4-4 and Cr4-12 (lanes M, E, ES, S); panel C shows the leader sequences of CRISPR 1 to 4.]

Figure 1 Expression of P. abyssi CRISPR cassettes. (A) Genomic locations of P. abyssi CRISPR loci. CRISPR cassettes and CRISPR-related operons 
encoding cas and cmr genes mapped on the circular P. abyssi chromosome. Orientations of CRISPR cassettes and adjacent genes are denoted. 
Depicted gene coordinates are based on the available completed genome of P. abyssi GE5. Direct repeats and degenerate direct repeats are 
indicated by black and grey boxes, respectively. Promoters upstream of CRISPR cassettes are denoted by arrows. Cr1-1, Cr4-4 and Cr4-12 RNAs 
are indicated in red. (B) Stable transcripts detected by Northern blot from CRISPR 1 and CRISPR 4 in exponential growth phase (E), entry into 
stationary phase (ES) or stationary phase (S). Arrows on the right indicate discrete RNA transcripts. The 5' end-labeled digest of ΦX174 DNA with 
HinfI served as length nucleotide marker (M). The box C/D guide RNA, sR26, served as a loading control (bottom panel). (C) Detailed features of 
the CRISPR leader sequences drawn to scale with the direct repeat sequence of the two CRISPR families. Positions are relative to the 
transcription start (+1) of the CRISPR precursor (preCr) indicated by a red arrow. The palindromic sequences in the direct repeat (DR) are 
represented by two inverted arrows. The 5' end of Cr1-1 is indicated by a red arrow in the DR1 sequence. The -10 and +30 boxes indicate BRE/ 
TATA promoter sequences and a conserved 5 nt poly A sequence, respectively. 



derived RNAs, named hereafter Cr1-1, Cr4-4 and Cr4-12, 
are of the size reported for crRNAs corresponding 
to a spacer sequence and part of the direct repeat 
[16,24]. Moreover, primer extensions with total RNA 
extracted from cells in entry-stationary phase (Additional 
file 3, Figure S2A) accurately identified their 5' 
ends (Figure 1C), which correspond to the dicing site 
reported for the P. furiosus endoribonuclease Cas6 [42]. 
Since only single reverse transcription arrests are 
observed (Additional file 3, Figure S2A) even though 
multiple Northern blot signals are detected (Figure 1B), 
we propose that length heterogeneity results from 
partial or incomplete 3' end processing. In conclusion, 
the CRISPR 1 and 4 cassettes are transcribed to produce 
small crRNAs in P. abyssi. No signals corresponding to 
precursor or other intermediates were detected by 
Northern hybridization (Figure 1B). Nevertheless, primer 
extensions with oligonucleotides hybridizing to the 5' 
junction of their respective first direct repeat resulted in 
the detection of an RNA precursor and thus permitted 
the identification of the transcription starts of the 
CRISPR 1 and 4 RNA precursors (preCr1 and preCr4, 
respectively) (Figure 1C and Additional file 3, Figure S2A). Similar 
experiments with specific primers of CRISPR 2 and 3 
did not reveal any RNA precursor confirming that 
CRISPR 2 and 3 are not expressed. Signals were not 
detected in Northern hybridization with probes against 
the ultimate spacer 22 of CRISPR 1 and spacer 28 of 
CRISPR 4, which harbor degenerate repeats. These 
results suggest that the direct repeats are important for 
the processing and/or stability of the mature crRNAs. 
To identify transcription signals, we analyzed the 200 bp 
region upstream of the first direct repeat of each P. 
abyssi CRISPR cassette. This analysis recognized con- 
served AT rich motifs arguing for the presence of pro- 
moters. The CRISPR 2 leader sequence has three AT- 
rich and one A-rich element upstream of the first direct 
repeat, not corresponding to any typical P. abyssi con- 
sensus promoter sequence as defined in [38]. In con- 
trast, the CRISPR 1 and 4 cassettes have consensus 
promoter sequences 10 nt upstream of the experimentally 
determined 5' starts of preCr1 and preCr4 (Figure 1C). 
Two additional short elements at position +30 and -100 
relative to the transcription starts were detected in 
CRISPR 1 and 4. These sequences are also observed in 
other thermococcal CRISPR cassettes at the same dis- 
tance from the first direct repeat (data not shown). It 
should be noted that the general organization of the 
CRISPR 3 leader is similar to those of CRISPR 1 and 4 
except for the distance between the +25 A-stretch and 
the first direct repeat (Figure 1C). This feature might 
account for the absence of transcription of the CRISPR 
3 locus. 

Expression of unique intergenic regions conserved 
among thermococcal genomes 

Four ncRNAs were clustered in this category (Table 1). 
For three of them, sRk11, sRk33 and sRk61, expression is 
detected at about the same intensity in all growth conditions 
whereas sRk28 is clearly growth phase-regulated 
with little or no detectable RNA in stationary phase 
(Figure 2A). Remarkably, the genomic contexts of sRk11, 
sRk28 and sRk33 are similar, all being flanked by ORFs 
transcribed in the same direction. Both sRk33 and sRk61 
cover predicted promoter sequences of the downstream 
annotated genes (Figure 2B). The respective 5' ends 
(Table 1) determined by primer extension experiments 
(Additional file 3, Figure S2B) are at an acceptable distance 
from the predicted promoter elements (BRE/TATA 
and TATA boxes) for sRk11 and sRk33, and overlap 
a predicted TATA box for sRk61 (Figure 2B). No 
such sequence is found upstream of sRk28 which might 
suggest that its transcription depends on the upstream 
gene PAB1991 since this region is predicted to form an 
operon that includes PAB1992. Except for sRk61, these 
ncRNAs are extremely well conserved in sequence and 
structure throughout other thermococcal genomes 
(Additional file 4, Figure S3A). The proposed RNA sec- 
ondary structures, based on RNAfold and multiple align- 
ment covariation analysis, highlight compensatory 
variations that maintain hairpin structures and, for 
sRk28, a potential pseudoknot structure (Figure 2C). 
Only sRk33 was predicted and validated in a previous 
comparative analysis in P. furiosus (referred to as SscA in 
[19]). sRk33 has a predicted secondary structure with a 
basal P1 stem with a high proportion of co-variations, an 
apical P2 stem conserved in sequence, and a 3' end 
sequence composed of A and U rich stretches that do 
not fold into a stable secondary structure (Figure 2C). 
Altogether, these characteristics suggest that sRk11, 
sRk28 and sRk33 might have conserved cellular functions 
in the Thermococcales. 

Expression of repetitive intergenic regions conserved 
among thermococcal genomes 

Among the three ncRNAs clustered in this category 
(Table 1), sRk49 shows sequence similarities with two 
additional regions in P. abyssi (49.2 and 49.3) and three 
in P. horikoshii (Additional file 5, Figure S4A). Sequence 
similarities preclude the design of specific probes to dis- 
tinguish between sRk49 and 49.2. Northern blot signals 
provided evidence for a growth stage-specific transcription 
pattern with the most abundant species migrating 
at about 150 nt. These transcripts could arise from either 
of the loci harboring an upstream TATA-like promoter 
(sRk49 and 49.2) (Additional file 5, Figure S4B & C). 
CR-RT-PCR experiments failed to detect primary tran- 
scripts, but suggested that antisense transcripts are 
expressed from the sRk49 operon (data not shown). 
Specific RNA structures or RNA modifications could 
have impeded the accurate amplification of the primary 
transcripts. Moreover, related sequences in the P. horikoshii 
genome are all clustered with CRISPR cassettes 
showing an organization comparable to that observed 
for the 49.2 region. All sRk49 variants are atypical in 
that they contain multiple short polyA stretches making 
RNA folding into stable helical structures implausible. 

The two other ncRNAs of this category are sRk48 and 
sRk52 (Table 1, Figure 3). They were identified indepen- 
dently in our initial computational search and are part 
Figure 2 Four ncRNAs from unique intergenic regions conserved among thermococcal genomes. (A) Transcripts detected by Northern blot 
from regions with a unique locus in the P. abyssi genome. Source of RNAs, markers and sR26 loading control are as indicated in Figure 1B. (B) 
Gene maps drawn to scale. Transcripts and predicted genes are symbolized by red and grey arrows, respectively. Consensus promoters and 
TATA-box sequences (as defined in Materials and Methods) are indicated in black and grey, respectively. The 5' start of each novel RNA was 
experimentally determined (Figure S2). (C) Secondary structure models with conserved nucleotides indicated by A, C, G and U, variable 
nucleotides conserving pairing by circled dots and variable nucleotides by dots. A putative pseudoknot in sRk28 is denoted by a dotted line. The 
underlined sequence in sRk33 indicates the putative promoter of PAB0465. 



of a set of six similar genomic sequences in P. abyssi. 
The similarities extend to all thermococcal genomes 
with four, three, two and one copies in P. furiosus, P. 
horikoshii, T. sibiricus and T. kodakarensis, respectively 
(Additional file 6, Figure S5). The sequence similarities 
precluded the design of specific probes for each locus. 
Therefore, the signals detected in the Northern blot in 
Figure 3A could arise from any of the six repeated 
genomic regions. In order to identify the transcribed 
regions, a CR-RT-PCR experiment was performed. Most 
clones corresponded to the 130 nt transcript with a 
majority matching sRk52 and a minority sRk48 (Figure 
3B). They shared similar 5' ends and variable 3' ends. In 
addition, a small percentage of clones corresponded to 
longer variants of sRk52 (177/180 nt) mapping a transcription 
initiation site 22 nt downstream of a predicted 
BRE/TATA box promoter (Additional file 5, Figure S4). 
The shorter 5'-monophosphorylated sRk52 transcript is 
likely to have arisen from maturation of the longer pri- 
mary transcript. Interestingly, RNAfold and multiple 
sequence alignments suggest a highly conserved RNA 
secondary structure within the Thermococcales (Figure 
4A). Many nucleotides, particularly in the P0 and P3 
stems, support co-variations arguing for structural conservation. 
The majority of the cloned CR-RT-PCR fragments 
contain sequences corresponding to the P1, P2 
and P3 stem-loops whereas a minority includes the P0 
stem-loop in their 5' part (Figure 4A). The P1 stem-loop 
is unusual since its primary sequence is highly conserved 
throughout the Thermococcales, giving this structure 
not only a high propensity to form, but also 
suggesting an important cellular function. Inspection of 
the P2-loop of sRk52 suggested that it forms a K-loop, 
which is consistent with the gel retardation of 5' end-labeled 
sRk52 transcripts in the presence of the recombinant 
P. abyssi L7Ae protein (Figure 4B). The P3-loop is 
variable and can be extended into a stem structure for 
the two P. abyssi sRk48-like sequences. 

Expression of repetitive intergenic regions specific to 
P. abyssi 

Amounts of the two ncRNAs of this category, sRkB and 
sRkC, were highest upon entry into stationary phase 
(Figure 5A; Table 1). Because these loci (Figure 5B) 
have significant sequence similarities to four other 
regions dispersed in the P. abyssi genome, we performed 
additional experiments to determine if the other loci 
were transcribed. Despite the sequence similarities, we 
could design oligonucleotide probes specific for each 
repetitive locus and perform additional Northern blot 
probing (data not shown), CR-RT-PCR experiments 
(Figure 5B and 5C) and primer extension analyses 
(Additional file 3, Figure S2B). From these data, the 
sRkB and sRkC loci were shown to be the only regions 
transcribed. In order to identify the 5' and 3' ends of 
these transcripts, three distinct CR-RT-PCR experiments 
were carried out. One pair of primers specific for the 
sRkB locus (R1) and two pairs of primers specific for the 
sRkC locus (R2 and R3) were chosen to characterize the 
overall RNAs expressed at these loci. The majority of 
the R1 clones correspond to transcripts from the sRkB 
locus harboring an identical 5' end and a variable 3' end 
of which half ended between position 209 to 218 nt and 
half between position 340 to 343 nt (Figure 5B and 
Additional file 4, Figure S3B). The length of sRkB RNAs 
as determined by CR-RT-PCR, corresponds to the sizes 
of the transcripts detected in Figure 5A. Moreover, 
sequence analysis revealed two typical promoter consensus 
sequences 24 nt and 179 nt upstream of the experimentally 
determined 5' end (Figure 5B and Additional 
file 4, Figure S3B). Interestingly the 3' extremities of all 
the RNA species are in the first 25 to 150 nucleotides of 
the PAB0571 ORF. Additional primer extension analysis 
with a specific probe within PAB0571 suggests that a 
unique RNA precursor is transcribed from this locus 
and subsequent processing events including 3' end trim- 
ming generate the different sRkB RNA species (data not 
shown). The R2 and R3 CR-RT-PCR analyses permitted 
the identification of three stable RNA species from the 
sRkC locus (Figure 5B). The two longer RNA species of 
roughly 137 nt (majority of the sequenced R2 clones) 
and 211 nt (half of sequenced R3 clones) share exactly 
the same 5' extremity. The 5' ends of the shorter 64/66 
nt RNA species are positioned 149 nt downstream (min- 
ority of sequenced R3 clones). In addition, primary tran- 
scripts were distinguished from processed ones by 
omission of treatment of the RNA preparations with 
tobacco acid pyrophosphatase (TAP) (Figure 5C). The 5' 
ends of the 211/218 nt RNA species are mostly triphosphorylated. 
As in the case of sRkB, a promoter sequence is 
found at an acceptable distance (24 nt) from the 5' end 
(Figure 5B and Additional file 4, Figure S3B). RNA species 
transcribed from the sRkC locus could result from 
5' and 3' end processing events of the primary 211/218 
nt long transcripts resulting in precise 5' monopho- 
sphorylated ends and variable 3' ends. However, it 
should be noted that the 64/66 nt RNA species were not 
detected in Northern blot analysis with specific probes 
(data not shown) suggesting that they are in low abun- 
dance in our RNA preparations. In contrast to the sRkB 
locus, the sRkC locus is located in a long intergenic 
region. The 3' end of PAB1080.5n, the nearest ORF, is 
located approximately 400 bp upstream of the 5' end of 
sRkC (Figure 5B). 

As mentioned earlier, the sRkB and sRkC loci are 
part of a set of six similar sequences that had no counterparts 
in other thermococcal genomes and are therefore 
specific to P. abyssi. Consistent with their lack of 
expression, the BRE/TATA promoter sequences are 
missing in the four other P. abyssi loci (Additional 
file 4, Figure S3B). Strikingly, one of these loci 
corresponds to the 3' end of the PAB1452 open read- 
ing frame. This ORF has been identified as a 'single 
OrfB element' of the IS605/IS200 family of transposases 
[43]. 



Interestingly, in our initial computational characteriza- 
tion, sRkB and sRkC were predicted to fold into a struc- 
ture with a K-turn RNA motif. To test if sRkB and 
sRkC form a K-turn, we performed gel retardation 
assays with recombinant P. abyssi L7Ae (Figure 6A). 
This assay reveals at least two distinct ribonucleoprotein 
complexes (RNPs) suggesting that more than one L7Ae 
protein binds to the sRkB and sRkC RNAs. To elucidate 
these RNA-protein interactions, footprint experiments 
using RNase T1 protection were performed with radioactively 
labeled 216 nt sRkB transcript (Figure 6B). Two 
sets of clustered guanosines are protected from RNase 
T1 digestion in the presence of L7Ae, suggesting K-turns 
in stems P1 (KT1) and P4 (KT2) (Figure 6B). 
Further support for the existence of KT1 and KT2 was 
obtained by mutations of the sheared GA base pairs, 
which are the hallmark of the K-turn. In this case, gel 
retardation assays showed that the shift with recombi- 
nant P. abyssi L7Ae was abolished (data not shown). 
Sequence alignment (Additional file 4, Figure S3B) and 
RNAfold predictions support the RNA secondary structure 
shown in Figure 6C. Altogether, our data reveal 
that several transcripts expressed from the sRkB and 
sRkC loci have K-turn motifs that might be important 
for their cellular function. Another well-known RNA 
structural motif, a loop E [39], might form in the upper 
part of P5. However, since there is no specific ligand 
that interacts with loop E, validation of this fold requires 
a detailed structural analysis. It is interesting to note 
that the initiation codon for PAB0571 is part of the
P5 stem of sRkB suggesting a functional link with the 
expression of the downstream PAB0571. 

Discussion 

Discovery of novel ncRNAs by combined bioinformatic 
approaches 

The rationale for carrying out a search for ncRNAs in
P. abyssi, the best studied Thermococcale, was that with
the exception of the box H/ACA [13] and the box C/D
[12] noncoding guide RNA families, few thermococcal
ncRNA families have been described. Our study
provides evidence for the growth-regulated expression of 12
novel ncRNAs in the P. abyssi genome consisting of
three crRNAs from CRISPR loci, four ncRNAs from 
unique loci conserved throughout Thermococcales of 
which one appears to be a homologue of the P. furiosus 
SccA RNA [19], three ncRNAs from repetitive loci con- 
served throughout Thermococcales, and two ncRNAs 
from repetitive loci specific to P. abyssi. Since small pro- 
teins (sproteins) of less than 50 amino acids have been 
largely disregarded in genome annotations [44], we can- 
not exclude the possibility that the ncRNAs identified 
here encode sproteins or peptides. We therefore 
searched for open reading frames within the ncRNA 
sequences starting with AUG, GUG or UUG, and end- 
ing with UGA, UAG or UAA. For several of the 
ncRNAs, it was possible to find small ORFs of 25 to 40 
amino acids. The only significant ORF identified corre- 
sponds to the distal portion of a putative transposase 
CDS (see below). We also analyzed the GC content of 
the ncRNAs. We found that it ranges from 33% to 66%. 
This large spectrum is consistent with the GC content 
for known ncRNAs (low for C/D guide RNAs, high for 
tRNAs and rRNAs). Since the computational analyses 
were restricted to intergenic regions, no cis-encoded
antisense ncRNA complementary to ORFs, as identified 
in transcriptome analyses of Sulfolobus solfataricus and 
Methanosarcina mazei [22,45] could be predicted. 
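
The small-ORF search and GC-content calculation described above can be sketched as follows. This is a minimal illustration in Python, not the procedure actually used in this study: the function names, the default length window and the choice to scan only the three forward reading frames are assumptions made for the example.

```python
# Illustrative sketch only: names and defaults are hypothetical,
# not taken from the study.
START_CODONS = {"AUG", "GUG", "UUG"}
STOP_CODONS = {"UGA", "UAG", "UAA"}


def gc_content(rna):
    """Return the percentage of G and C residues in a sequence."""
    rna = rna.upper()
    if not rna:
        return 0.0
    return 100.0 * sum(base in "GC" for base in rna) / len(rna)


def find_small_orfs(rna, min_aa=25, max_aa=40):
    """List (start, end, n_codons) for ORFs in the three forward frames.

    start and end are 0-based and end-exclusive; end includes the stop
    codon. n_codons counts the start codon but not the stop codon.
    """
    rna = rna.upper().replace("T", "U")  # accept DNA input as well
    orfs = []
    for frame in range(3):
        open_start = None  # first start codon seen since the last stop
        for pos in range(frame, len(rna) - 2, 3):
            codon = rna[pos:pos + 3]
            if open_start is None and codon in START_CODONS:
                open_start = pos
            elif open_start is not None and codon in STOP_CODONS:
                n_codons = (pos - open_start) // 3
                if min_aa <= n_codons <= max_aa:
                    orfs.append((open_start, pos + 3, n_codons))
                open_start = None
    return orfs
```

Applied to each ncRNA sequence, such a scan returns candidate small ORFs in the 25 to 40 codon range together with a GC percentage that can be compared with the 33% to 66% range reported here. Extending it to the reverse strand would require scanning the reverse complement as well.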



Below, experimental and genomic features of the new P. 
abyssi ncRNA families are discussed. 

Processing of ncRNAs 

In this study, transcripts of different length were identi- 
fied at one locus suggesting that RNA processing occurs 
in P. abyssi. The sRkB, sRkC, sRk48 and sRk52 primary
transcripts with triphosphorylated 5' ends are processed
into shorter 5' end monophosphorylated transcripts 
through maturation reactions (Figure 3C and Figure 
5C). For example, the 3' end heterogeneities observed 
for sRkB, sRkC, sRk48 and sRk52 RNAs could arise 
from trimming of the primary transcript by the 3' to 5' 
exonucleolytic activity carried by the Pyrococcus exo- 
some [46]. The 5' monophosphorylated RNAs could 
result from endonucleolytic or pyrophosphohydrolytic 
activity. Previous examples of processing in Pyrococcus 
were limited to tRNA processing by RNase P [47] and 
crRNA maturation by the Cas6 endonuclease [42,48].
Recent results suggest that the 5' to 3' exonucleolytic 
activity of RNase J homologues may also be involved in 
RNA processing in Euryarchaea and Crenarchaea 
[49,50]. 

CRISPR-derived RNAs 

Our detection of crRNAs is the first demonstration of the
existence of an active CRISPR defense system in P. abyssi.
The CRISPR loci 1 and 4 appear to be expressed
independently of growth phase under our experimental
conditions. In agreement with the findings in P. furiosus [51],
no antisense transcription of these CRISPR loci was
detected (data not shown), excluding the possibility that
double-stranded RNA intermediates are formed. This
contrasts with the report that RNA is transcribed from the
complementary spacer strand in S. acidocaldarius [52].
This difference could reflect a specific feature of the
Sulfolobales or a technical issue regarding the sensitivity of
the Northern blot analysis used in the studies of P. furiosus
and P. abyssi. Based on our data, we propose a new
annotation for the four P. abyssi CRISPR arrays that
differs from earlier proposals (UCSC archaeal genome
browser and [40]). The presence of a typical consensual 
promoter in the leader sequences of CRISPR 1 and 
CRISPR 4 correlates with the detection of CRISPR- 
derived RNAs from these loci. It has been reported that 
CRISPR cassettes expressing crRNAs provide acquired 
immunity [28,53,54]. It should be noted that short 
regions of spacers 7 and 19 of CRISPR 1 (14 bp and 17 
bp, respectively) are complementary to two coding 
regions of PAV1, a virus that infects P. abyssi (data not
shown) [55]. No other similarities were observed between
CRISPR 1 and 4 spacer sequences and known P. abyssi
mobile genetic elements [56,57]. The silent CRISPR 2
and 3 loci differ from the expressed cassettes in several
ways. First, the CRISPR 2 locus has divergent direct
repeat and leader sequences. Second, the CRISPR 3 locus is
atypical in its leader sequence, the reduced number of
repeats, and the sequence degeneracy of its repeats,
suggesting that it is a relic of an active CRISPR cassette as
mentioned in [27,28,58]. In general, CRISPR cassettes are 
physically linked to a cohort of conserved cas or cmr 
genes, in varying orientation and order, that encode 
CRISPR-associated proteins (reviewed in [24,27,31]). It is 
interesting to note that the only cas gene (Cas6, 
PAB1613) in the vicinity of P. abyssi CRISPR cassettes is 
located upstream of CRISPR 3 (Figure 1A). In the
P. abyssi genome, all the other genes encoding Cas/Cmr
core proteins are grouped into an operon located far
from any CRISPR locus (Figure 1A). Further studies will
be required to elucidate the mode of action and the
dynamics of the CRISPR/Cas system in P. abyssi.

5' UTR-derived ncRNAs

Distinct RNA species are not always indicative of inde- 
pendently synthesized RNAs. As revealed by RACE 
experiments and analysis of genomic context, several 
ncRNAs (Table 1) seem to derive from mRNA leaders. 
This class of ncRNAs appears to originate from matura- 
tion of longer transcripts, as we demonstrated for sRkB 
and propose for sRk28, sRk33 and sRk61. Stable RNAs 
derived from RNA leaders were initially identified in 
E. coli and some appear to correspond to riboswitch
elements such as RFN and THI [59]. It is now well
established that 5'UTRs can encompass
transcription-termination signals involving riboswitch structures
[60,61]. In Archaea, it has recently been suggested that 
riboswitches may also exist. Based on comparative geno- 
mics, crcB has been found in Archaea and Bacteria, and 
TPP and flp in Euryarchaea [60,62]. The crcB element 
was found within our set of 82 candidates (Additional 
file 2, Table S1), but no ncRNA corresponding to this
element was detected by Northern blotting under the
conditions employed in this study. To date, no biological
evidence showing gene regulation by riboswitches has
been reported in Archaea. Nevertheless, the conserved
70 nt sRk28 shares several features with the bacterial
preQ1 riboswitch [63,64]. PAB1992, adjacent to sRk28,
is predicted to encode an ATPase with a queuosine
synthesis-like protein domain (QueC). The
growth-regulated sRkB RNA is transcribed from a locus adjacent to
PAB0571, which encodes a putative ATPase. In bacteria,
the expression of several ATPase-like proteins is
controlled by cis-mechanisms involving the SAM-I
riboswitch. It is interesting to note in Listeria monocytogenes
that this type of riboswitch also acts in trans as a small 
ncRNA to regulate expression of the virulence factor 
PrfA [7]. Moreover, our data strongly suggest that sRkB 
has the propensity to form K-turn structures, which are
found to be functional RNA motifs involved in the
assembly of ribonucleoprotein complexes and in the 
orientation of pseudoknot interactions in the aptamer 
structures of the SAM-I and lysine riboswitches [36,37]. 
Therefore, sRkB could have a specific function in the 
P. abyssi cell that involves the regulation of the
expression of the putative ATPase encoded by PAB0571. We
speculate that sRk28 and sRkB may be representative of 
a novel family of archaeal ncRNAs produced by tran- 
scription attenuation or RNA processing. Finally, the 
promoter-associated sRk33 and sRk61 could be related 
to the recently discovered ncRNAs associated with
transcription start sites reported in higher eukaryotes
[65,66], in yeast [67,68] and in Salmonella enterica
serovar Typhi [69]. This class of ncRNAs is thought to
interfere with transcription of the downstream promoter by
cis-mediated occlusion or by a trans-mechanism.

Transposase-related ncRNAs 

In this study, sRk48, sRk52, sRkB and sRkC ncRNAs, 
which are transcribed from repetitive loci, were shown to 
be related to a CDS annotated as an orphan OrfB element 
of the IS607 and IS200/IS605 family [43]. For example, in
the case of sRk48 and sRk52, similar sequences
correspond to 3' end coding regions annotated as partial
OrfB-like IS607 transposase in P. furiosus and T. sibiricus. In
bacteria, it has been shown that IS10 transposition events
are controlled by a complementary cis-encoded antisense
ncRNA [70], but this seems unrelated to our observations 
since the ncRNA sequences matched the sense sequence 
of the 3' end of the annotated ORFs. However, it is perti- 
nent to note that a link between ncRNAs and transposase 
ORFs was already observed in Sulfolobus solfataricus 
[16,17] and in Salmonella [71]. Moreover, in Sulfolobus 
solfataricus [17] it was shown that several small RNAs 
linked to annotated transposase ORFs could bind the mul- 
tifunctional ribosomal protein L7Ae through recognition 
of an RNA kink turn as is the case for sRkB, sRkC and 
sRk52. 

Conclusions 

Two novel P. abyssi ncRNA families, the CRISPR and the
5' UTR-related ncRNAs, are described in this study. We
certainly missed other P. abyssi ncRNA families because 
the biased composition screen only identifies GC rich 
ncRNAs. This is the case for the box C/D guide RNAs 
with low GC content. Nevertheless, the high number of 
known ncRNAs including the box H/ACA RNAs found 
by this approach confirms its relevance in searching for 
ncRNAs in AT-rich genomes. One limitation of this search,
a bias due to misannotated ORFs and therefore misdefined
intergenic regions, could not be excluded since sequence
signals for archaeal transcription and translation are not
as well defined as their counterparts in Bacteria. The
discovery of novel ncRNAs by our combined
computational approaches emphasizes the potential diversity of
ncRNAs in Archaea, which could be enlarged by global
RNomic approaches such as RNAseq technology. This
would provide a deeper insight into the P. abyssi ncRNA
world and help improve our knowledge of the specific
roles of P. abyssi ncRNAs and their relationship to the
5'UTRs described in this study.

Methods 

Computational search 

Genome sequences and related annotation files (.gbk,
.fna and .ptt extensions) were downloaded from the
NCBI ftp database for the P. abyssi, P. furiosus, P. horikoshii
and T. kodakaraensis genomes. For these four genomes,
a comparative analysis of all intergenic sequences was
performed using RNAsim as described in [2]. RNAsim
searches for conserved sequences and structured regions 
between different genomes by using Wu-blast 2.0 to 
select pairwise alignments including conserved 
sequences (here with W = 7, E < 0.0001) and QRNA 
[72] to identify in pairwise alignments base substitution 
patterns that could correspond to a conserved RNA sec- 
ondary structure. In a final step, RNAsim combines this 
information to predict loci that are conserved in multi- 
ple genomes. In this analysis, all the regions encoding 
annotated ncRNAs except for the box H/ACA RNA 
genes were excluded from the set of intergenic regions 
(46 tRNAs, 1 rRNA operon, 2 5S rRNAs, 59 C/D guide 
RNAs, 1 RNase P RNA and 1 7S/SRP RNA). GC-rich 
regions were predicted in P. abyssi by using the same 
segmentation approach as previously used by Klein and 
colleagues to predict new ncRNAs in P. furiosus [19].
However, our approach differed in that the transition 
probabilities were adjusted to enrich the number of can- 
didate regions. Additional sequence comparisons were 
performed on putative ncRNA candidates with Blastn 
(default values and W = 7) to add to this initial set 
other P. abyssi and archaeal homolog sequences. The
BRE/TATA consensus boxes 5'_RNNANNTTTAWA_3' and
5'_RAAANNTWWWWA_3', the TATA consensus box
5'_TTWWWWA_3', and the K-turn and K-loop consensus
motifs, inferred from the literature [34,38,39,73], were
identified using Patscan [74]. Regions of favorable free
energy were computed by setting the free energy thresh- 
old such that all tRNAs were found in sliding windows 
of 150 bp (here threshold=-32.3). Only regions of free 
energy below the threshold and longer than 50 bp were 
displayed in ApolloRNA, an extension of the annotation
environment Apollo [75], developed to support ncRNA 
analysis. Highly structured hairpins with minimal hair- 
pin stems of 6 bp (including G-U and U-G pairs) were 
searched with Patscan. RNA secondary structures were 
proposed on the basis of multiple alignments of similar
regions within the Pyrococcales using Multalin [76] that
were improved manually by combining RNAfold predic- 
tions and covariations. All data including predictions, 
motifs and annotations were integrated and visualized in 
ApolloRNA. 
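
The consensus search described above was carried out with Patscan. The following Python sketch uses regular expressions instead, simply to illustrate how the IUPAC consensus strings quoted in the text can be expanded and scanned against a DNA sequence. The dictionary keys, the function names and the restriction to the given strand are assumptions of this example, and the K-turn and K-loop descriptors are not covered.

```python
import re

# IUPAC nucleotide codes needed for the consensus boxes quoted in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "W": "[AT]", "N": "[ACGT]"}

# Consensus strings from the text; the labels are hypothetical.
CONSENSUS = {
    "BRE/TATA_1": "RNNANNTTTAWA",
    "BRE/TATA_2": "RAAANNTWWWWA",
    "TATA": "TTWWWWA",
}


def consensus_to_regex(consensus):
    """Translate an IUPAC consensus string into a compiled regex."""
    body = "".join(IUPAC[symbol] for symbol in consensus)
    # The lookahead allows overlapping occurrences to be reported.
    return re.compile("(?=(" + body + "))")


def scan(dna, consensus_table=CONSENSUS):
    """Return sorted (position, label, matched_sequence) hits.

    Only the strand given in `dna` is scanned; positions are 0-based.
    """
    dna = dna.upper()
    hits = []
    for label, consensus in consensus_table.items():
        for match in consensus_to_regex(consensus).finditer(dna):
            hits.append((match.start(), label, match.group(1)))
    return sorted(hits)
```

Because the two BRE/TATA consensus strings overlap in what they accept, a single promoter-like sequence can be reported under both labels at the same position. A scan of both strands would additionally require the reverse complement of each sequence.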

Oligonucleotides used in this work 

Additional file 7, Table S2 lists the primers used for
Northern blot detection, CR-RT-PCR, primer extension,
and the preparation of transcription templates by PCR.

Preparation of total cellular RNA 

P. abyssi strain GE5 cells were grown as described in [77] 
at 95°C in Vent Sulfothermophiles Medium. P. abyssi 
cells were stopped at three different growth phases: expo- 
nential, end of exponential and stationary phases. The 
exponential and stationary phase cell paste was provided 
by A. Hecker (Institut de Genetique et de Microbiologie, 
Paris Sud-Orsay). Entry into stationary phase cell paste 
was purchased (Reims University, France). Total RNA 
was prepared from P. abyssi cell paste by Trizol
extraction followed by treatments with DNase RQ1 (RNase-free,
Promega), proteinase K and phenol/chloroform
extraction followed by ethanol precipitation.

Northern blotting analysis 

Total RNA (10 µg) extracted from cells in exponential
growth phase (E), entry into stationary phase (ES) or
stationary phase (S), and a 5' [32P]-end-labeled denatured
PhiX174/HinfI DNA ladder were separated on a
denaturing 6% polyacrylamide gel (8 M urea, 1 x TBE buffer).
Gels were transferred onto Hybond N+ nylon membrane
using a Transphor Power Lid (Hoefer) apparatus in 0.5 x
TBE buffer. The RNAs were UV cross-linked to the
membrane (1200 J/cm2). Prehybridization was carried out
for 30 minutes at 50°C in hybridization buffer (5 x SSC,
1 x Denhardt's solution, 1% SDS, 0.05 mg/mL herring
sperm DNA). DNA oligonucleotides were designed
using Primer designer or Vector NTI and 5' end labeled
with [γ-32P]ATP and polynucleotide kinase.
Hybridizations were carried out at 50°C for 16 h followed by two
washes in 0.1 x SSC, 0.1% SDS buffer at room
temperature for 10 min. The blots were analyzed by
phosphorimaging (Molecular Dynamics) or autoradiography using
MR or MS film (Kodak).

Primer extension and Circular RACE (CR-RT-PCR) 

Primer extension analysis was performed using 10 µg of
total RNA prepared from P. abyssi cells in entry into
stationary phase. Total RNA was reverse transcribed at
42°C by AMV reverse transcriptase (Promega) using a 5'
end labeled specific primer. CR-RT-PCR was performed
with 20 µg of total RNA prepared from P. abyssi cells in
entry into stationary phase, treated with (+) or without
(-) 25 U of Tobacco Acid Pyrophosphatase (TAP)
according to the manufacturer's protocol (Epicentre
Biotechnologies). RNAs were extracted with
phenol/chloroform then precipitated with ethanol. RNA (1 µg) +/-
TAP was circularized with 40 U of T4 RNA ligase (New
England Biolabs), extracted with phenol/chloroform,
ethanol precipitated and reverse transcribed by AMV
reverse transcriptase using specific primers. After
ethanol precipitation, the reverse transcripts were PCR
amplified using Crimson Taq (New England Biolabs)
and appropriate primers. The products were separated 
on a 6% native polyacrylamide gel (1% glycerol, 0.5 x 
TBE), treated with TAP, cloned in pCR®II-TOPO® with 
TOPO-TA Cloning® Kit according to the manufac- 
turer's instructions (Invitrogen) and sequenced by Euro- 
fins MWG Operon. About 100 sequences were analyzed 
for each CR-RT-PCR experiment. 

In vitro transcription 

A portion of the intergenic region corresponding to 
sRk52, sRkB and sRkC, respectively, was amplified by 
PCR from P. abyssi genomic DNA using specific primers 
(Additional file 2, Table S1). PCR products served as
templates for in vitro transcription using T7 RNA poly- 
merase as previously described [34]. 

Electrophoretic mobility shift assay (EMSA) and enzymatic 
structural probing 

EMSA was performed using E. coli tRNA as a
non-specific competitor as previously described [34]. RNA and
RNP complexes were separated on a native 6% (19:1)
polyacrylamide gel containing 0.5 x TBE and 5%
glycerol. Electrophoresis was performed at room
temperature at 250 V in 0.5 x TBE running buffer containing
5% glycerol. The gels were dried and visualized using a
Fuji-Bas 1000 PhosphorImager.

Additional material 



Additional file 1: Figure S1; Strategy for ncRNA candidate predictions.
An initial set of candidates resulted from two complementary prediction
methods. The bias composition selected 73 regions within 67 IGRs. The
predicted regions expressing known ncRNAs (except for the H/ACA
sRNA) were removed from the candidate pool, which reduced the
number of candidates to 22 regions. The comparative analysis selected
106 regions within 95 IGRs. Based on the quality of the sequence
alignments, we kept 65 of them for further analysis. Of the 73
candidates found by the two approaches, 14 regions were common to both. Finally,
the comparison of these 73 regions using BlastN (W = 7) against the P.
abyssi genome itself allowed the identification of nine additional regions
corresponding to genomic repeats.

Additional file 2: Table S1; List of the 82 predicted regions from the
combined computational screens. 

Additional file 3: Figure S2; Primer extension experiments on total
RNAs extracted from cells in entry into stationary phase. Length marker (M)
corresponds to the sequence (T) of the sRkB locus amplified from oligo
sRkB_F (Additional file 7, Table S2) with the Thermo sequenase cycle
sequencing kit from USB. (A) Primers matching pre-crRNAs and crRNAs.
Reverse transcription arrests are denoted by small arrows. Direct repeats
of CRISPR loci are highlighted in grey. (B) Primers matching ncRNAs as
indicated.

Additional file 4: Figure S3; (A) Sequence alignment of sRk11, sRk28,
sRk33 and sRk61 related loci within the thermococcal genomes. The 5'
and 3' ends of ncRNAs are specified. (B) Sequence alignment of the six
loci related to sRkB/sRkC loci within the P. abyssi genome. Capital and
small letters are for intergenic and ORF, respectively. The consensus BRE/
TATA box promoter sequences are underlined. The 5' (arrows) and 3'
(brackets) ends from CR-RT-PCR products are indicated. Black nucleotides
indicate conserved nucleotides within the transcribed regions and
highlighted nucleotides specify sequence variations. Secondary structure
features are symbolized on top of each sequence; «» for stems,
(()) for pseudoknots and * for
covariation or G->U/U->G mutations preserving base-pairing.

Additional file 5: Figure S4; (A) Sequence alignment (as denoted in
Additional file 4, Figure S3) of the three loci related to sRk49 locus found
in the P. abyssi (sRk49, sRk49.2 and sRk49.3) and P. horikoshii genomes. (B)
Detection of sRk49 by Northern blotting. Source of RNAs, markers and
sR26 control are as indicated in Figure 1B. (C) Gene maps drawn to scale
with transcripts depicted in red.

Additional file 6: Figure S5; Sequence alignment (as denoted in
Additional file 4, Figure S3) of the sixteen sequences related to sRk48/
sRk52 loci found in the P. abyssi (Pab), P. horikoshii (Pho), P. furiosus (Pfu),
T. sibiricus (Tsi) and T. kodakaraensis (Tko) genomes.

Additional file 7: Table S2; List of oligonucleotides used in this study. 